Scalable epidemic message passing interface fault tolerance
Authors
Abstract
Resilience and fault tolerance are challenging tasks in high performance computing (HPC) and extreme-scale systems. Components fail more often in such systems, which results in application aborts. Fault-tolerance techniques can be adopted to detect failures consistently and to continue the application's execution even if failures exist. The message passing interface (MPI), a prominent parallel programming specification, is used to implement the failure detection and consensus algorithm in this paper. Although MPI does not facilitate fault-tolerant behavior, this work presents a fault-tolerant, matrix-based failure detection and consensus algorithm. The proposed algorithm uses gossiping. To detect failures, randomised pinging is applied during execution using piggybacked gossip messages. To achieve consensus on failures in the system, information about failed processes is sent in the same piggybacked gossip messages to all alive processes. The algorithm was implemented in the MPI framework and is completely fault tolerant. The results show that the failed processes were detected and that global consensus was achieved in the system.
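The gossip mechanism described in the abstract can be illustrated with a small single-machine simulation. This is only a sketch under stated assumptions, not the authors' MPI implementation: the function name `simulate_gossip`, the Boolean suspicion matrix layout, the rule that a receiver adopts the sender's suspicions, and the use of ground truth to decide when to stop are all hypothetical choices made for illustration. Failures are assumed to be permanent crashes with no false suspicions.

```python
import random


def simulate_gossip(n, failed, seed=0, max_cycles=10_000):
    """Return the number of gossip cycles until every alive process agrees
    on the failed set, or None if max_cycles is exceeded.

    Hypothetical model: view[i][j][k] is True when process i believes that
    process j suspects process k of having failed.
    """
    if n < 2:
        raise ValueError("need at least two processes")
    rng = random.Random(seed)
    failed = set(failed)
    alive = [p for p in range(n) if p not in failed]
    view = {i: [[False] * n for _ in range(n)] for i in alive}

    def agreed(i):
        # Local consensus: i suspects exactly the failed processes (checked
        # against ground truth, which only a simulation can do) and knows
        # that every process it considers alive suspects them too.
        rows = view[i]
        suspects = {k for k in range(n) if rows[i][k]}
        if suspects != failed:
            return False
        return all(rows[j][k]
                   for j in range(n) if j not in suspects
                   for k in suspects)

    for cycle in range(1, max_cycles + 1):
        for i in alive:
            # Randomised pinging: pick any other process uniformly.
            target = rng.choice([p for p in range(n) if p != i])
            if target in failed:
                # No reply: i records its own suspicion of the target.
                view[i][i][target] = True
                continue
            # The ping piggybacks i's matrix; the target merges it.
            src, dst = view[i], view[target]
            for j in range(n):
                for k in range(n):
                    if src[j][k]:
                        dst[j][k] = True
            # Assumption: the target also adopts the sender's suspicions.
            for k in range(n):
                if src[i][k]:
                    dst[target][k] = True
        if all(agreed(i) for i in alive):
            return cycle
    return None


if __name__ == "__main__":
    print(simulate_gossip(8, {3, 5}, seed=0))
```

Calling `simulate_gossip(8, {3, 5})` models eight processes of which two have crashed. The returned cycle count depends on the random seed, so it is useful for comparing runs rather than as a fixed figure.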
Similar resources
Fault Tolerance in Message Passing Interface Programs
In this paper we examine the topic of writing fault-tolerant Message Passing Interface (MPI) applications. We discuss the meaning of fault tolerance in general and what the MPI Standard has to say about it. We survey several approaches to this problem, namely checkpointing, restructuring a class of standard MPI programs, modifying MPI semantics, and extending the MPI specification. We conclude ...
RADIC-based Message Passing Fault Tolerance System
We present an analysis and design of how to incorporate a transparent fault tolerance system at socket level for message passing applications. The novel design changes the default socket model to avoid sockets being unexpectedly closed due to a remote node failure. Moreover, a pessimistic log-based rollback recovery protocol added at this level makes it possible to restart and re-execute a failed parallel p...
Recent Results on Fault-Tolerance Consensus in Message-Passing Networks
This paper surveys recent results on fault-tolerant consensus in message-passing networks. We focus on two categories of works: (i) new problem formulations (including input domain, fault model, network model, etc.), and (ii) practical applications. For the second part, we focus on Crash Fault-Tolerant (CFT) systems that use Paxos or Raft, and Byzantine Fault-Tolerant (BFT) systems. We also br...
MPI: A Message Passing Interface
The MPI Forum. This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups,...
Journal
Journal title: Bulletin of Electrical Engineering and Informatics
Year: 2022
ISSN: 2302-9285
DOI: https://doi.org/10.11591/eei.v11i2.3374